.SSEC(AM as Heuristic Search)

.SSSEC(Heuristic Search)

As the title of this section -- and this thesis -- proclaims, AM is a
kind  of  "heuristic search"  program.   That  must mean  that  AM is
exploring a particular "space," using some informal evaluation criteria
to guide it.

Heuristic gourmets sometimes find it useful to classify heuristic searches in the
following way (depending on their flavor):

Spumoni:
The first kind of search paradigm is  where each
operator merely adds more knowledge to  a central base of facts.  For
example,  in theorem-provers, each  operator is a  rule of inference.
Applying an operator adds one more wff to the list of known true statements.
So in some sense ⊗4all⊗* paths lead to a "goal"; the problem is to find a
short, direct path. A solution is "good" if it consists of a very short path
leading to an acceptable goal node.
The quality of the path is very
important.

Rocky road:
The  second   flavor  of  heuristic  search  is   that  of  a  guided
tree-search. For example, consider the typical chess program.   After
applying an operator (like "castling"), a new  node is created (a new
hypothetical  board configuration), but  if it is  necessary to "back
up", then virtually  none of the  new nodes along  that path will  be
added to the system's central knowledge  base.  The task really is to
find ⊗4any⊗* path leading to the goal. A  solution is "good" if it consists of
a path leading to an excellent goal node. The quality of the path is not very
important.

.SSSEC(AM's Similarities to Heuristic Search)

AM does the former kind of "self-adding" exploration. It contains, at
any  moment, a  collection  of  partially-understood concepts.    Its
activities   include  understanding   those   concepts  better,   and
occasionally defining new concepts.   AM thus continually enlarges  a
body of mathematical knowledge. AM is spumoni-flavored.

If forced into it, one  could say that the "nodes" of  AM's space are
its   partially-filled-out   concepts   (which  include   structures,
operators, and interesting relationships between other concepts), and
AM's "operators" are  all its  tactical rules  for filling  out those
concepts,  and for  creating  new concepts. AM's "heuristics"  of the
heuristic search paradigm  are its remaining  rules:  the
ones which  guide the system at  a strategic level. 
So we're forced to draw a line, separating AM's guiding rules into two
separate categories: trivial tactics versus high-level strategies.
This is painful and artificial, since in reality the rules are distributed
fairly evenly along a continuum.

.ONCE TURN ON "{}"

It would be nice to say that a "node" is simply what AM calls a concept,
but unfortunately
the process of "creating and
filling out a new node" would then be very time-consuming. It's better to
only partially fill out  newly-created nodes.   In fact, the task  of
deciding which nodes to fill out a little more -- and precisely which
facets  of them to worry about filling out  -- is itself handled as a
heuristic search (via the agenda mechanism.   See Sections {SECNUM}.3
and {SECNUM}.4). The standard heuristic search paradigm treats
"growing a new node" as a primitive process.

.SSSEC(AM's Differences from Heuristic Search)

There are  several difficulties and anomalies in  forcing AM into the
heuristic search paradigm.  For example, the operators which AM  uses
to  enlarge  and  explore  the  space   of  concepts  are  themselves
mathematical concepts (like  composing two relations to produce a new
one). Thus AM should be viewed as a mass of knowledge  which enlarges
⊗4itself⊗* repeatedly.  As far as  I know, all previous  systems kept
the  knowledge they "discovered" distinct from  the knowledge used to
make new discoveries$$  Of course this  is typically because the  two
kinds  of knowledge  are very  different: For  a  theorem-prover, the
first kind is "true predicate calculus statements", and the second is
"strategies for applying rules of inference". $.

Another difference between  AM and other heuristic searchers  is that
AM has no well-defined goal criteria. Its sole aim is to preserve the
interestingness level of the activities it performs, of the top tasks
on the agenda.  We don't care what AM does -- or misses -- so long as
it  spends its  time on  plausible tasks.  There is  no fixed  set of
theorems that  AM should  discover (AM  is  thus not  like a  typical
⊗4problem-solver⊗*),  no fixed set  of traps  AM should avoid  (AM is
thus very different from  ⊗4game-players⊗* like chess programs).  

For
example,  there is  no stigma  attached  to the  fact  that AM  never
discovered  real  numbers$$
There are many "nice" things which AM didn't -- and can't -- do:
e.g., devising 
↓_geometric_↓   concepts from its initial   simple   set-theoretic
knowledge.   Paul Cohen has  indicated that
this would be a supremely exciting development: the invention of
geometry -- and  all the tools it provides  -- by a system  which has
neither  geometric  intuition  built  into  it,  nor  definitions  of
geometric concepts. I do not think this possible. $;
it was rather  surprising that AM  managed  to
discover ⊗4natural⊗* numbers!  Even if  it hadn't done that, it would  be
acceptable  if AM  had simply  gone off  and developed  ideas in  set
theory.
.SSEC(The need for an agenda)

<<  This  section's  heart  is in  the  right  place,  but  it  needs
rewriting!! >

This  section  provides  motivation for  the  following  one, but  is
virtually content-free  and  may be  skipped  if  the reader  is  not
interested in the origins of the agenda scheme.

In a standard heuristic search,  a particular operation is applied to
a particular node, and new nodes result.  For example, in chess, a
"node" might be  a board configuration,  and an "operation" might  be
castling.   In such  searches, the  action "create a  new node  X" is
trivial.  It may eat up some memory space (e.g., storing a new  board
position), but growing a  new node rarely requires much  "work", much
processing time.

In  scientific  concept  formation,  viewed  as  a  heuristic  search
process, the situation  is quite  different, however. A  "node" is  a
highly-structured concept. We may wish to create it after we find out
just a couple of its facets (e.g., its definition and one interesting
conjecture  involving  it).    At  that  moment,  we   would  have  a
partially-filled-out new  node. A few facets would  have some entries
in them, but most facets would be blank.  By the usual definition  of
"creating a node", we would have to face,  a this stage, about twenty
tasks,  many of them time-consuming  (e.g., fill in  some examples of
this concept,  fill in  some generalizations  of it,...).   For  each
blank facet, some time would have to be spent filling it in.

But most  of these activities  are not cost-effective. Too  much time
would  be spent, for  too little reason.   Clearly, we  don't want to
generalize and specialize ⊗4every⊗* new concept$$ This  would in fact
cause an infinite regress: When concept X is to be created, we must
first find new concepts which are generalizations of X; but to create
such   new  concepts,   we  must   first   find  generalizations   of
↓_them_↓,...$.   As soon  as examples of  a concept X  are filled in,
some pattern may be perceived.  At that moment, it is clearly  better
to go  off and explore  that pattern,  than to blithely  continue on,
filling  in all the other  facets of X.  As  an extreme case, suppose
⊗4no⊗* examples  of X were  found. Then  X might  be vacuous, and  it
would be absurd to spend  time trying to specialize it.  On the other
hand, there would then be good reason to try to ⊗4generalize⊗* X.  In
fact, perhaps one of the  best ways for AM to spend its  time at that
moment  would  be to  progressively  generalize X  until  one  of its
generalizations ⊗4did⊗* have some examples.

In other words, it seems reasonable that sometimes we'd like  to quit
working on "fleshing out" the blank  facets of a concept, in order to
explore  some particular  new concept,  or in  order to  perform some
specific task.   In each case, some  good reasons will have  appeared
which  induce  us  to  break  away from  the  plodding  "filling-out"
activities we were  just engaged  in.  Specific  reasons always  have
priority over general inertia (focus of attention).

Computer science provides names for several schemes for accomplishing
this redirection of the flow of control:

.B04

(i)  We  could   let  the  reasons  act  like  little  programs,  and
⊗4↓_interrupt_↓⊗* each  other.   When  a reason  for doing  X  gained
control, the task X  would be started.  But then how  do we cope with
several  reasons pro and con  a given activity?   Also, when one task
has been done, how can we be sure where the best place to "return" to
is?

.OO

(ii) We could consider that  we have one central program, but that it
can ⊗4↓_recurse_↓⊗* whenever a better  task becomes evident. One  can
then  imagine  that  the  control-stack  would   contain  a  list  of
interrupted  tasks,  getting  more and  more  worthwhile  (better and
better reasons) as we proceed toward the current moment.  The current
task can't be interrupted, except for  the sake of performing an even
⊗4more⊗* interesting  task.  If the current task finishes, the normal
recursion process will  take us back  to finish the  last task.   But
recent discoveries may dramatically affect the worthwhileness of the
activities on the control stack; how  could we reorder them?   Within
the standard function-calling scheme, this can't be done.

.OO

(iii) It sounds like we might need  to have a ⊗4↓_process_↓⊗* scheme,
involving many active  tasks, which can go to sleep and wake up under
certain conditions.  Recall that most of the reasons will  not really
be in  support of  a task  the size of  "fill out  a concept  X", but
rather  one of the size "fill  out facet F of  concept X".  But these
tasks only  use  up  a few  seconds  each  (say averaging  around  15
seconds).  Is  it really worth it to worry  about interrupting one of
these tasks, once it's started?   Is it worth using a hundred  memory
cells to store the state of a process that would only use up 5 more cpu
seconds if  we let it wake up and continue?   What if we have to keep
around a thousand partially-complete tasks?  Probably not worth it.

.OO

(iv) Most of the tasks we have hanging around are  fairly independent
of  each  other,  so  perhaps  we   could  exploit  the  power  of  a
⊗4↓_multi-processor_↓⊗*.   But we have thousands of little tasks, and
they spawn  new ones. Well,  if we only  could run  the program on  a
10,000-processor machine (like the proposed Hypercube machine of <<get
this reference>)... Well,  when one  of these  is built  -- and  good
systems software  exists for  it --  this may  be worth  considering.
Even so, the results of experiment 2 (see Sec. 5.3) indicate that a
very important mechanism  is the one  whereby highly-rated new  tasks
get suggested dynamically, while the current task is executing.  So the
gain  in  speed would  only be  a small  factor  (maybe one  order of
magnitude -- not  three).  A much  more standard criticism of  such a
scheme is that it only chops the time by a constant amount (at best,
10,000), whereas the tasks grow exponentially.

.OO

(v) The most obvious scheme is  to just ⊗4↓_execute them all_↓⊗*,  in
some  order.   Even  though we  may  as well  look  on each  task  as
indivisible, we simply can't afford to spend 15 cpu seconds executing
each one:$$ Please forgive the pragmatic discussion that is  about to
occur, but the reader probably will sympathize with the need to worry
about cpu  time and space.  $ that would use up 100,000 seconds (days
of cpu time). For each new concept created, about  20 new tasks would
be proposed.  The process of  "adding a node" would thus use up 5 cpu
minutes of time.

.OO

(vi) Well, what do computers do when  they have lots of tasks to  do,
of varying priority?   Often, they make out an agenda,  a schedule of
things to  do. The most important tasks are  tackled first, and so on
down the line. Typically there will be a ⊗4↓_dynamic scheduler_↓⊗* to
pick the next  task at each moment.   Maybe our system  should do the
same thing, namely schedule all the tasks, to choose them in order of
decreasing estimated  worth.  That is,  pick the  one  with the  best
reasons supporting  it, and execute  it.  Repeat  this Select-Execute
procedure over  and  over  again.   If  a  new,  valuable  task  gets
suggested, it can be merged into the agenda.  If the new task is even
more valuable  than the current one,  so what?!  This  will be a rare
occurrence, and anyway the current task will probably be  nearly done
(so we give up an extra few seconds here and there).

.END

This explains the choice of an agenda as the control mechanism for
AM.    We  have  glossed  over  the  interesting  problem  of how  to
intelligently choose which task is the best one to execute next.  The
details of how that scheme works are described in the next section.

.SSEC(The Agenda)

.AGENDASEC: SECNUM 

.AGENDASSEC: SSECNUM 

.AGENDA:

The time has come to  discuss the agenda mechanism that AM  uses.  At
any moment, AM has many tasks (perhaps hundreds or even thousands) which
have been suggested for some reason or other, but haven't been
carried out yet.  Each task is at  the level of filling  in a certain
facet  of a certain  concept. Recall  that each task  also has tacked
onto it a list of  symbolic reasons explaining why the task  is worth
doing.   In addition, a  number (between 0  and 1000) is  attached to
each reason, representing some absolute measure of the value of  that
reason (at the moment).  One global formula combines all the reasons'
values  into a  single priority  value for  the task  as a  whole. To
reiterate, a task's priority  is based on  the worths of the  reasons
attached to the  task.  So a  typical entry on the  agenda might look
like this:

.BEGIN NARROW 0,10
.BBOX
MBOX TASK: Fill-in examples of Sets $
MBOX PRIORITY: 300 $
MBOX REASONS:  $
MBOX		100: No known examples for Sets so far. $
MBOX		100: Failed to fillin examples of Set-union, for lack of examples of Sets $
MBOX 		200: Focus of attention: AM recently worked on the concept of Set-union $
.EBOX
.END
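
In LISP terms, such an entry need be nothing fancier than a nested list. The
rendering below is a hypothetical one (it is not AM's internal format), but it
is the representation assumed by the other sketches in this chapter:

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

;; A task: (priority (act facet concept) (rating reason-string) ...)
(defvar *sample-task*
  '(300 (fill-in examples sets)
        (100 "No known examples for Sets so far.")
        (100 "Failed to fillin examples of Set-union, for lack of examples of Sets")
        (200 "Focus of attention: AM recently worked on the concept of Set-union")))

.E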

<<This paragraph is out of place, or not written well: >

The three reasons each have a fairly low rating, and the total
priority of the task is therefore not great. It is, however, higher
than any single  reason.  This  is because  there are three  separate
reasons supporting  it.   The global  formula uniting  these reasons'
values must  not simply take the largest of them (ignoring the rest),
nor should  it simply  add them  up  (a few  mediocre reasons  aren't
really better than one excellent reason).  We'll return to this issue
in a moment.

Notice the similarity of this to the initial few lines which AM types
just after it selects a job to work on:

.BEGIN NOFILL PREFACE 0 TURN OFF "{}" TURN ON "↑↓\" TABS 18,21 SELECT 3

***Task 2.
Filling in examples of the following concept: "Sets".

	3 Reasons:\(1) No known examples for Sets so far.
\(2) Failed to fillin examples of Set-union, for lack of examples of Sets.
\(3) Focus of attention: AM recently worked on Set-union.

.E

The flow of control is  quite trivial at the top level:  AM picks the
task with the highest  priority value, and tries to execute it.  As a
side effect, new entries may get  added to the agenda while the  task
is  being executed.    The global  priority value  of  the task  also
indicates how much time and space this task deserves. The sample task
above might rate 20 cpu  seconds, and 200 list cells. When  either of
these resource quanta is used up, AM terminates work on the task, and
proceeds to pick a new  one.  In the case  of filling in examples  of
sets, the space allowed will be used up quickly.
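
That top-level loop is short enough to sketch directly. The fragment below is
illustrative only: ⊗6execute-task⊗* is a stand-in for the process (described in
the next sections) of gathering and running the relevant heuristics, and the
particular resource formulas are guesses, not AM's actual ones.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defvar *agenda* nil)     ; tasks sorted by decreasing priority;
                          ; each task: (priority (act facet concept) reasons...)

(defun execute-task (task cpu-quantum space-quantum)
  ;; stub: the real work -- gathering the relevant heuristics and
  ;; running them until a quantum runs out -- is described below
  (declare (ignore cpu-quantum space-quantum))
  (format t "~&Working on ~S~%" (second task)))

(defun run-am (n-cycles)
  "Repeatedly pick the best task and execute it within its quanta."
  (dotimes (i n-cycles)
    (when (null *agenda*) (return))
    (let* ((task (pop *agenda*))                ; highest priority is first
           (cpu-quantum   (/ (first task) 15))  ; e.g. priority 300 -> 20 sec
           (space-quantum (* 2/3 (first task)))); e.g. priority 300 -> 200 cells
      (execute-task task cpu-quantum space-quantum))))

.E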

The big  question  now is  "How does  AM  execute a  task, once  it's
chosen?" Several smaller issues also must be discussed:

.BN

λλExactly what  can be done during a  task's execution, and what must
be formulated as a new task to be executed sometime in the future?

λλHow do plausible new tasks get suggested?

λλWho thinks up the reasons supporting a task, and how does he do it?

λλWho evaluates each reason, assigning it an absolute numeric rating?
How?

λλDo these ratings change? When and how?

λλWho  combines the  reasons'  values,  and comes  up  with a  single
priority value for the task? How?

λλDoes a task's priority value change? When and how?

λλWhat are the dangers -- philosophical and empirical -- of relying on
absolute numeric ratings?

λλHow "tuned" is the system to depending on particular numeric values
and formulae? How fragile/robust is it?

.END

.COMMENT When DIFSECNUM... is available, ONCE TURN ON "{}" here;

These will all be discussed in the next few sections, and illustrated
in the next chapter. More discussion of difficulties and  limitations
of  these  ideas  can  be  found   in  Section  {[2]  DIFSECNUM}.{[1]
DIFSSECNUM}, on page {[3] DIFSEC}.
.SSEC(Executing a chosen task)

Okay, so we've selected a task to execute
-- say "Fill-in Examples of Primes".
How exactly do we do this?

The answer is: "We select relevant heuristics, and execute them."
This really just splits our original question into two new ones:
(i) How are the relevant heuristics selected, and (ii) What does it mean for
heuristics to be executed (e.g., how does executing a heuristic rule
help to fill in examples of primes?).

.SSSEC(Executing heuristic rules)

The  next section  will  describe  how  the relevant  heuristics  are
gathered  up in  an efficient  manner. For  the moment,  let's assume
we've done that.  What do  the heuristics look like, and how do  they
carry out the desired task?

A typical heuristic, attached to the concept Operation, says:

.GROUP B816

If the task is to fill in examples of the operation F,

One way to get them is to run F on randomly chosen examples of the
domain of F.

.E


Of course,  in the LISP implementation, this situation-action rule is
not coded quite so neatly. It would be more  faithfully translated as
follows:

.B816

If CURRENT-TASK matches (FILLIN EXS F←anything), and F is an
Operation,

Then carry out the following procedure:

.INDENT 12,16,0 PREFACE 0 BNN←0

1. Find the domain of F, and call it D;

2.  Find examples of D, and call them E;

3.  Find an algorithm to compute F, and call it A;

4.  Repeatedly:

.INDENT 16,16,0

4a. Choose any member of E, and call it E1.

4b. Run A on E1, and call the result X.

4c. Check that X satisfies the definition of F.

4d. Add X to the Examples facet of F

.E APART

In this case,  we see  clearly that a  heuristic rule  really can  be
"executable", and  that executing it  really can produce  the desired
kinds  of entities.   The  heuristic  is seen  to be  a goal-directed
(data-directed, pattern-directed, etc.) function call.
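
For concreteness, here is one way that rule might look as an actual
condition-action pair in LISP. This is a sketch, not AM's code:
⊗6is-example-of⊗*, ⊗6satisfies-definition-p⊗*, and the facet accessors
⊗6get-facet⊗* and ⊗6add-facet-entry⊗* are hypothetical stand-ins for the real
ones, and the effort cutoff is arbitrary.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defun fillin-exs-of-operation-if (task)
  "IF-part: the current task is (FILLIN EXS F) and F is an Operation."
  (and (eq (first task)  'fillin)
       (eq (second task) 'exs)
       (is-example-of (third task) 'operation)))     ; hypothetical test

(defun fillin-exs-of-operation-then (task)
  "THEN-part: run F's algorithm on random examples of F's domain."
  (let* ((f      (third task))
         (domain (first (get-facet f 'domain/range)))  ; step 1
         (exs    (get-facet domain 'examples))         ; step 2
         (alg    (first (get-facet f 'algorithms))))   ; step 3
    (when (and exs alg)
      (dotimes (i 10)                                  ; step 4, arbitrary cutoff
        (let* ((e1 (nth (random (length exs)) exs))    ; 4a
               (x  (funcall alg e1)))                  ; 4b
          (when (satisfies-definition-p f e1 x)        ; 4c, hypothetical check
            (add-facet-entry f 'examples (list e1 x)))))))) ; 4d

(defvar *fill-in-examples-of-an-operation*
  (list #'fillin-exs-of-operation-if #'fillin-exs-of-operation-then))

.E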

Well, this is all fine, but we'll have hundreds -- maybe thousands --
of heuristics;  we can't afford  to run  the If-parts (the  left hand
sides, the  situation-test, etc.) of each one each time a new task is
chosen.  The next subsection  discusses how AM is able to zero  in on
the relevant heuristic rules.

.SSSEC(Gathering the relevant heuristics)

Recall that there are pointer structures with labels like Genl, Spec,
Exs, Up. For example, the concept Primes might point to the concept
Numbers with a link labelled Genl, and Numbers would point to Primes
with an arc labelled Spec, since Primes are special kinds of Numbers.

By calling for Primes.Genl, we might recover the list (Numbers).
By calling for Numbers.Genl, we might recover the list (Object).
Repeating this process, we eventually will reach some very general
concept like "Anything". If we join together all the lists we find along the
way, we could be said to have ⊗4rippled⊗* away from Primes, in the Genl direction.

For example, here is a diagram which represents rippling away from the concept
Compose-Intersect&Intersect. Diagonal lines are Genl/Spec, and vertical lines are
Exs/Up. Near the top, many concepts are connected by both kinds of links.

.B0

 Anything
    \  ~
     \ ~
      Any-concept
       \  ~
        \ ~
        Any-activity
          \  ~
           \ ~
	  Operation
           / \  ~
          /   \ ~
         /    Composition
        /       \
       /         \
Dom⊗1=⊗*Range-op     Compose-op-⊗1&⊗*-itself  
       ~          ~			 
       ~          ~			 
    Compose-Intersect⊗1&⊗*Intersect

.E

<<This para -- and section -- must be rewritten: >

In case you're interested, a concept A is a ⊗4specialization⊗* of a concept
B iff A could be defined as "B.DEFN and...", where B.DEFN is some definition
of B, and the ellipsis can be filled in with any predicate.
Concept A is an ⊗4example⊗* of concept
B iff A satisfies some definition
of B. So these two ideas are not mutually exclusive.
"Composition" is a valid example of an operation, since it satisfies the definition
of Operation. On the other hand, Compositions form a subclass of all operations;
to be a composition, X must be an operation and also must satisfy some stronger
requirements. So Composition is also a specialization of Operation.

The important idea here is to see that the bottom concept is connected to the
other ones in a natural chain-like way. You could start at the bottom and
"ripple" your way upward, by following the Up and Genl links.

Now each concept has facets which contain some heuristics. Some of these are for
filling in, some for checking, some for deciding the interestingness, etc.
For example, one Interest feature of Compose says that a composition is
more interesting if the range of the first argument equals the domain of the
second, and that a composition is meaningless if those two sets don't even
intersect.

Suppose we want to judge the interestingness of the bottom concept. We ripple
upwards, gathering heuristics as we go. Several of these have to do with why
a Composition of any kind might be interesting; several have to do with why
any operation might be interesting, etc.

Or, suppose we want to fill in examples of that bottom concept. Again we
ripple upwards. Compose explains how to use examples of its arguments to
find examples of the composition. Operation explains how to "run" the concept
on randomly chosen domain elements, to derive examples that way.
Any-concept explains how to instantiate the definition of the concept, to symbolically
construct examples directly from the definitions of Compose-Intersect&Intersect.

In general, the suggestions of the more specific concepts will be better than
those of the general concepts. This is the old generality-vs.-power tradeoff.
So AM begins executing the heuristic rules, rippling upward, and quits after
it's used up its time quantum (its activation energy) for the current task.
If it doesn't make it all the way up to Anything, that's no great loss, since
Anything has such general information that it is rarely practical to
use it (e.g., to fill in examples of Anything, pick random objects and see if
they satisfy the definition).
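
The rippling itself is nothing deep; it is just a breadth-first closure over the
Genl and Up links, with the nearest concepts' heuristics coming first so that
the more powerful, specific advice gets tried before the weak, general advice.
A sketch follows (ignoring the time-quantum cutoff just mentioned, and again
using the hypothetical ⊗6get-facet⊗* accessor, with the Genl and Up links
assumed to be stored as facets):

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defun ripple-upward (concept)
  "CONCEPT plus everything reachable via Genl and Up links, nearest first."
  (let ((seen nil)
        (queue (list concept)))
    (loop while queue
          do (let ((c (pop queue)))
               (unless (member c seen)
                 (push c seen)
                 (setq queue (append queue
                                     (get-facet c 'genl)
                                     (get-facet c 'up))))))
    (nreverse seen)))

(defun relevant-heuristics (concept facet)
  "Gather, e.g., all the Fillin or Interest heuristics applicable to CONCEPT."
  (mapcan (lambda (c) (copy-list (get-facet c facet)))
          (ripple-upward concept)))

.E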

These linkages serve another purpose, besides providing an implicit framework
to guide the gathering of heuristic rules. Suppose one rule asks for
examples of Operations, during its course of execution. How can AM quickly
find all examples of operations? Well, a reasonable answer is to ripple
downward along the Spec direction for some time (perhaps not at all), then
go down the Exs link (only once), and then ripple along Spec again as far
as you please. Graphically,

.B0

C1
 \
  \
   \	⊗4any number of Spec links⊗*
   ...
     \
      \
      C2
       ~     ⊗4one Exs link⊗*
      C3
       \
	\
	 \	  ⊗4any number of Spec links⊗*
	 ...
	   \
	    \
	    C4

.E

C1 might coincide with C2, and C3 with C4. 
Using the simple diagram above, the only examples of Operation which are
pictured are Compose and Compose-Intersect&Intersect. The latter is
plucked in two distinct ways, in fact.
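
In the same spirit, here is a sketch of that example-gathering recipe: zero or
more Spec links, then one Exs link, then zero or more Spec links again. The
helpers are hypothetical, as before.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defun spec-closure (concepts)
  "All concepts reachable from CONCEPTS by zero or more Spec links."
  (let ((seen nil)
        (queue (copy-list concepts)))
    (loop while queue
          do (let ((c (pop queue)))
               (unless (member c seen)
                 (push c seen)
                 (setq queue (append queue (get-facet c 'spec))))))
    seen))

(defun all-examples (concept)
  "Examples of CONCEPT:  Spec*, then one Exs link, then Spec* again."
  (spec-closure
   (mapcan (lambda (c) (copy-list (get-facet c 'exs)))
           (spec-closure (list concept)))))

.E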

So the links are used both to guide the gathering of heuristics, in order to
actually execute a task once it's been chosen from the agenda, and also as
efficient ways to gather examples, specializations, etc. of a given concept.

.SSEC(Creating New Concepts)

As we've  discussed before, the  "right-hand-side" of  a rule can  do
three kinds of things:

.BN

λλ Fill in some entries for a facet of a concept.

λλ Create a new concept.

λλ Suggest new tasks and add them to the agenda.

.E


Section 3.4  explained how a heuristic rule  might find entries for a
given facet of a given concept. Below, we shall illustrate the second
kind of action a heuristic rule may take: creating a new concept.

.SSSEC(What to do at the moment of creation -- and what to defer)


The reader may be glad to learn  that this is in some sense a trivial
action.   A  concept is  nothing more  than a bunch  of Facet/Entries
pairs (i.e., a bunch  of labels which have some  information attached
to  them).   The typical action  of a  heuristic rule  is to  fill in
entries for a given  facet of a given  concept (see Section 3.4).  So
all we need  do to create a new  concept C is to tag  the appropriate
data  structures (e.g., if  there is a  global list  of all concepts,
then we must  add this  new name, C,  to that list),  and to  suggest
several new jobs (⊗6"Fill-in the  definition of C", "Fill-in examples
of C"⊗*,...).   Idealistically, we need fill in nothing at the moment
of C's birth.

Of course, there are two key reasons for filling in  as many parts as
we can, at the time of creation:

.BN

λλ  Several  pieces of  information  are trivial  to  obtain at  this
moment, but may be hard to reconstruct later (e.g., the reason why  C
was created).

λλ In practice, it is impossible to work with a concept unless enough
is  known about  it to  disambiguate  it from  all others  (e.g., its
definition, or its algorithm,  or an intuition, or several  examples,
etc.).

.E

There are  of course two quite  general reasons for never  filling in
anything without a good reason:

.BN

λλ To fill in any facet of any concept will use up some precious time
and space.

λλ Often  the process of  filling in  a facet  can be explosive,  can
create new  concepts which will  ⊗4also⊗* need that  particular facet
filled in.

.E

So the universal motto of AM is to fill in facets of a new concept if
-- and only if -- that filling-in activity will be nonexplosive, easy
at the moment, and much harder later on.

In  almost all cases,  the following facets  will be  filled in right
away: Definitions,  Algorithms, Domain/range,  Worth, plus  a tie  to
some related concept (e.g., if the new concept is a generalization of
Equality,   then  we   can  trivially  fill   in  an   entry  on  its
Specializations facet: "Equality".)

In almost all cases, the following facets will ⊗4not⊗* be immediately
apparent:  Examples,   Generalizations,  Specializations,  Ties,  and
Interestingness.

So AM will typically fill in the former kinds of facets, and relegate
filling in the latter  facets to separate tasks which it  adds to the
agenda, and which  AM may or may not eventually get around to trying.
Thus the  agenda will  contain several  jobs of  the form  ⊗6"Fill-in
examples of..."⊗*, but very few of the form ⊗6"Fill-in the definition
of..."⊗*.

The  actual algorithm  for doing  this is  quite simple. AM  spends a
small amount of time  trying to fill in  each part of C  mentioned in
the heuristic rule.  AM quits working on a part if some entries don't
start materializing quickly.  Whenever a part is "given up" on, a new
task is added to the agenda, to fill in  that part of the new concept
C.  Also, any facet which is not mentioned in the rule will also have
a new job added  to the agenda, to  fill it in.   The worth of  these
jobs will be fairly low, unless  there is some specific reason to the
contrary. The precise priority values will depend on the overall
worth of the facet involved, plus the Worth facet of the new
concept.   The   facet   EXAMPLES   is   rated   much   higher   than
SPECIALIZATIONS, because looking for examples of a concept is often a
good expenditure of time,  producing the raw data on  which empirical
induction thrives.  Specializations of the new concept would be other
brand new concepts, and  thus filling in that  facet would be a  very
explosive process.
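
As a procedure, that policy is roughly the following. This is only a sketch:
the facet list, the ratings, and the helpers -- including ⊗6propose-task⊗*,
which is a stand-in for the task-proposing machinery of Section 3.6 -- are not
AM's actual ones.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defvar *all-concepts* nil)

(defun create-concept (name facet-entries worth)
  "Create concept NAME; fill in the cheap facets now, queue the rest."
  (let ((c (make-concept name)))
    (push c *all-concepts*)
    ;; facets handed to us explicitly by the creating rule: fill in now
    (dolist (pair facet-entries)
      (add-facet-entry c (first pair) (second pair)))
    (add-facet-entry c 'worth worth)
    ;; every facet still blank becomes a separate task on the agenda
    (dolist (facet '(examples generalizations specializations))
      (unless (get-facet c facet)
        (propose-task (list 'fill-in facet name)
                      "Newly created concept; this facet is still blank."
                      (if (eq facet 'examples)       ; Examples rate much higher
                          (round (* .5 worth))
                          (round (* .1 worth))))))
    c))

.E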

.SSSEC(An illustration: Squaring a number)

Let's take a simple (but not atypical) illustration  of all this.  AM
has  recently discovered the  concept of  multiplication, and decides
that it is very interesting. A heuristic rule exists which says:

.B816

If a  newly-interesting  operation F(x,y)  takes  a  pair of  N's  as
arguments,

Then create  a new  concept, called  F-Itself, taking  just one  N as
argument, defined as F(x,x), with initial worth Worth(F).

.E

In  the case of F =  TIMES, we see that F takes  a pair of numbers as
its arguments,  so the  heuristic rule  would  have AM  create a  new
concept called TIMES-Itself, defined as TIMES-Itself(x) ≡ TIMES(x,x).
That is, the operation of squaring a number.

What would AM  do in  this situation?   The global  list of  concepts
would be  enlarged to include  the new  atom "TIMES-Itself", and  the
facets  of  this  new concept  would  begin  to be  filled  in.   The
following facets would get filled in almost instantly:


.BBOX
MBOX $
MBOX NAME: TIMES-Itself $
MBOX $
MBOX DEFINITIONS: $
MBOX 		ORIGIN: λ(x,y) [TIMES.DEFN(x,x,y)] $
MBOX $
MBOX ALGORITHMS: λ(x) [TIMES.ALG(x,x)] $
MBOX $
MBOX DOMAIN/RANGE: Number α→ Number $
MBOX $
MBOX GENERALIZATIONS: TIMES $
MBOX $
MBOX WORTH: 600 $
MBOX $
.EBOX

The  name,   definition,  worth,   and  domain/range  are   specified
explicitly by the heuristic rule.

The  lambda  expression  stored  under  the  definition facet  is  an
executable LISP predicate, which accepts two arguments and then tests
them to  see whether the second  one is equal to  TIMES-Itself of the
first argument.  It performs this test by calling upon the predicate
stored under the  definition facet of the  TIMES concept.  A  trivial
transformation  of  this provides  an  algorithm  for computing  this
operation.

The worth of TIMES is 600 at  the moment, and this becomes the  worth
of TIMES-Itself.

TIMES-Itself  is by  definition  a specialization  of  TIMES, so  the
SPECIALIZATIONS  facet of TIMES is  incremented to point  to this new
concept.  Likewise, the GENERALIZATIONS facet of TIMES-Itself points
to TIMES.
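
In LISP, the entire creation step amounts to building a couple of closures over
TIMES's own definition and algorithm. The sketch below assumes ⊗6times⊗* is
bound to the TIMES concept and leans on the hypothetical facet accessors from
earlier; it is meant only to show how little work the moment of creation
involves.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(let* ((times-defn   (first (get-facet times 'definitions)))
       (times-alg    (first (get-facet times 'algorithms)))
       (times-itself (make-concept 'times-itself)))
  ;; DEFN:  TIMES-Itself(x) = y   iff   TIMES(x,x) = y
  (add-facet-entry times-itself 'definitions
                   (lambda (x y) (funcall times-defn x x y)))
  ;; ALG:   compute it by running TIMES's own algorithm on (x,x)
  (add-facet-entry times-itself 'algorithms
                   (lambda (x) (funcall times-alg x x)))
  (add-facet-entry times-itself 'domain/range '(number number)) ; Number -> Number
  (add-facet-entry times-itself 'worth
                   (first (get-facet times 'worth)))            ; inherit 600
  ;; link the two concepts in both directions
  (add-facet-entry times-itself 'generalizations times)
  (add-facet-entry times        'specializations times-itself)
  times-itself)

.E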

Note how easy it  was to fill in these facets  now, but how difficult
it might be later on, "out of context". The task of, e.g., filling in
⊗4Specializations⊗* of TIMES-Itself will  be no harder later  on than
it is  right now,  so we may  as well defer  it until there's  a good
reason for it. This task will probably be added to the agenda with so
low a priority that AM  will never get around to it,  unless some new
reasons for it emerge.

The   task  ⊗6"Fill-in   examples  of  TIMES-Itself"⊗*   is  probably
worthwhile doing soon, but  again it won't be any  harder to do at  a
later time than  it is right now.   So it is not done  at the moment;
rather, it is added to the agenda (with a fairly high priority).

Incidentally, the  reader may be interested to know that the next few
tasks selected  would create  the  inverse of  this operation  (i.e.,
integer square-root), and then create  a new kind of number, the ones
which can be  produced by squaring  (i.e., perfect squares).  Perfect
squares  are  worth having  because  Integer-square-root  is  defined
precisely on that set of integers.

.SSSEC(The Intuition for Squaring)

The notion  of  "Intuition" is  so strange  that I  shall take  every
opportunity to  try to explicate it. This  subsection is a digression
to that purpose,  and may therefore be  skipped forever, returned  to
later, or read right now.

Consider  what  happens  when   the  task  ⊗6"Fill-in  intuitions  of
TIMES-Itself"⊗*  gets  chosen.  Rippling  upward,  we  find  that the
intuition for TIMES is a  rectangle (in the plane) built out  of unit
squares, where the lengths of the two sides of the rectangle represent
the two arguments to TIMES, and the area of the rectangle (i.e., the
rectangle itself)  represents the value  of TIMES.   For the  current
operation, this translates to a rectangle whose sides are all of equal
length: i.e., a  square. If AM  is given  this notion explicitly,  it
will actually suggest renaming the TIMES-Itself operation "Square".

Notice  the   nice  intuitive   image  that   this  will   cause  for
square-rooting,  since a number n  is represented as a  pile of n 1x1
squares. To take  the square-root of a  number, we must arrange  that
pile into one large square, and then see how long one side is.

This also suggests several trivial relationships to verify. Since the
number 1 is intuited as a  1x1 square, it seems intuitively clear  to
AM that  Times-Itself(1) will  be equal  to 1.  This is then  checked
"officially"  (using the  definition  of Times-Itself)  and verified.
Similar considerations suggest that Times-Itself(0) will equal 0, and
that in general Times-Itself(x) is much larger than x.

Although AM never noticed it, this intuition is just the right one to
notice that to get from n⊗A2⊗* to (n+1)⊗A2⊗*, it is necessary to add
2n+1 unit squares: i.e., that n⊗A2⊗* + 2n + 1 = (n+1)⊗A2⊗*. This is a
far  from  trivial result  to  notice, when  one  doesn't know  about
algebra. In fact, AM couldn't verify such a general conjecture, since
it knows neither the concept of  mathematical induction, nor anything
about proof in general.

.SSEC(Proposing New Tasks)

Sections 3.4 and 3.5 explained how a heuristic rule can fill in entries
for a certain facet of a concept, and how a rule might result in a new
concept being created. This section explicates the third kind of activity
which a heuristic rule can initiate: suggesting new jobs, and adding them
to the agenda.

A few more of the questions raised in Section 3.3 can then be answered:

.BEGIN INDENT 4,0,0 PREFACE 0

how do new tasks get suggested?

how do the reasons for the tasks get thought up?

how do those reasons get rated?

.END

<This kind of side-effect is perhaps the SIMPLEST of the 3 kinds that
heuristic rules can produce; perhaps it should be discussed sooner? >

A heuristic rule will have the form

.B816

If ⊗4<some empirical situation>⊗*...

Then ⊗4<some things to do>⊗*...

.E

Among the "things to do" are suggestions for future tasks. These suggestions
are then simply added to the agenda list. That's it!
Each such suggestion consists of two parts: the job itself, plus reasons for
why this job should be done. A typical job has the form ⊗6"Fill in
examples of Perfect-squares"⊗*. Each reason supporting a job contains an
English sentence (an opaque string, a token) and a rating 
(a number between 0 and 1000). One global formula will combine all the reasons'
ratings into one single priority value for the task.
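
The bookkeeping behind "simply added to the agenda" looks roughly like this.
The sketch merges a suggested job into a global sorted agenda, tacking the new
reason (with its rating) onto the job if the job is already there;
⊗6combine-reason-values⊗* stands for the global formula discussed in the next
subsection. None of this is AM's actual code.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defvar *agenda* nil)   ; entries: (priority job (rating reason-string) ...)

(defun propose-task (job reason rating)
  "Add JOB to the agenda, or attach one more reason if it is already there."
  (let ((entry (find job *agenda* :key #'second :test #'equal)))
    (if entry
        (pushnew (list rating reason) (cddr entry) :test #'equal)
        (push (setq entry (list 0 job (list rating reason))) *agenda*))
    ;; recompute the job's overall priority from all of its reasons
    (setf (first entry) (combine-reason-values (cddr entry)))
    (setq *agenda* (sort *agenda* #'> :key #'first))
    entry))

.E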

For example, one rule says

.B816

If very few examples of X are found,

Then add  the following task  to the agenda: "Generalize  the concept
X",  for the following reason:  "X's are quite  rare; a slightly less
restrictive concept might be more interesting"; this reason's rating is
the ratio of nonexamples/examples found.

.END

Of course, the situation part (left-hand-side) of the rule would really look
more like this:

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

If the current task was (Fill-in examples of X),
    and X is a predicate,
    and more than 100 items are known in the domain of X,
    and at least 10 cpu seconds were spent trying to randomly instantiate X,
    and the ratio of successes/failures is both >0 and less than .05
Then...

.E

Even this is one full step above the actual implementation, where
⊗6"X is a predicate"⊗* would be coded as
⊗6"X ε EXS(PREDICATE)"⊗*. The function ⊗6EXS(X)⊗* looks at
the examples facet of its argument X, at specializations of those examples, etc.,
and recursively includes ⊗6EXS( SPECIALIZATIONS (X) )⊗*.
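
A recursive two-liner is all ⊗6EXS⊗* needs to be; the sketch below leans on the
⊗6spec-closure⊗* and ⊗6get-facet⊗* helpers sketched earlier, and is of course
only a guess at the real function.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defun exs (x)
  "Examples of X and of X's specializations, plus their own specializations."
  (remove-duplicates
   (append (spec-closure (get-facet x 'examples))
           (loop for s in (get-facet x 'spec) append (exs s)))))

.E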

The most suspicious part of this trigger is the number ".05". Where did it come
from? 
It was chosen as a number which is "very much less" than
the "ideal" fraction of examples which should pass a predicate.$$
Ah, well this ideal fraction is clearly at least .1, and at most .9, else
almost every argument is getting the same reply. Well, what is that ideal fraction?
Does it even make sense to say it exists, invariant over all predicates, in all fields
of human endeavor?  Of course not; this leads even to formal contradictions, let
alone aesthetic ones! Well, by symmetry we could settle on 0.5; a figure closer
to 0.1 is
arrived at by introspection (that's right, a gut-reaction guess), and would no
doubt be .11111..., if humans had nine fingers instead of ten.
In any event, we shall say that the ideal predicate returns True about 10% of the
time. Thus  a predicate returning True less than half this often is too "strict".
Hence the .05 $
That remark was of course tongue-in-cheek, and the only honest 
justification for .05 is that you may change it (to .01, or to .25) with virtually
no change in AM's behavior$$ This is the conclusion of experiment 5 (see Section
6.4) $.


The English string is treated simply as a token; that is, AM is not allowed to
inspect parts of it, parse it, transform it, etc. 
The most AM can do is compare two such tokens
for equality. Of course, it would not be too hard to extend this capability to
permit AM to syntactically analyze such strings,
or to trivially compute some sort of "difference" between two given reasons.

The heuristic rule also contains
a little program for computing the numeric worth of this reason (at any given
moment). The sample rule above contains a rather simple
formula for computing the worth of that reason.
In actuality, this would be checked to ensure that the result lies between
0 and 1000. 

It might be advisable to tack this formula onto the agenda list, instead of just
evaluating it once and tacking on the value returned.
The advantage of the latter approach is to save a great deal of space; the
former procedure would be more "intelligent":
Say that dozens of examples of the concept are found in between the time
the task was proposed, and the time AM selects it to execute it.
Those new examples would change the ratio of exs/non-exs, hence the priority
of the reason, hence the priority of the task itself.
An experiment should be done to judge the effect of this on AM's behavior.
Unfortunately, this experiment has not been done, and AM is stuck with the
simpler notion of statically evaluating the little programs only once, at the time
a task is added to the agenda.

.SSSEC(The Ratings Game)

In general, a heuristic rule can have several reasons in support of a job it
suggests. Each reason has an English phrase and a numeric-valued formula
associated with it. Also, the task may already exist on the agenda, where it
can have several associated reasons supporting it.

One global formula looks at all the ratings for the reasons, and combines them
into a single priority value for the task as a whole. This overall rating is
used to choose which task on the agenda to execute next, and how much
time and space to expend on it before quitting.

Below is that formula, although I hesitate to even present it:

.BEGIN NOFILL INDENT 4 SELECT 6 TURN ON "&↑↓"

Worth(J) = ||SQRT(SUM R↓i&↑2)|| ⊗7x⊗* [ .2⊗7x⊗*Worth(A) + .3⊗7x⊗*Worth(F) + .5⊗7x⊗*Worth(C)]

Where J = job to be judged = (Act A, Facet F, Concept C)
     and {R↓i} are the ratings of the reasons supporting J.

.END

For example, consider the job J = (Check examples of Primes).
The act A would be "Check", which has a numeric worth of 100.
The facet F would be "Examples", which has a numeric worth of 700.
The concept C would be "Primes", which at the moment might have Worth of 800.
Say there were four reasons, having values 200, 300, 200, and 500.
The double lines "||...||" indicate normalization, which
means that the final value of the square-root must be between 0 and 1,
which really just means dividing the result of the Square-root by 1000 always.

In this case, we first compute Sqrt(200↑2 + 300↑2 + 200↑2 + 500↑2) = Sqrt(420,000)

which is about 650.   After normalization, this becomes 0.650.
The expression in square brackets is actually computed as the 
dot-product of two vectors;
in this case it is the dot-product of (100 700 800) and (.2 .3 .5), which yields
630. This is multiplied by the normalized
Square-root value, 0.650, and we end up with a
final priority rating of roughly 400.
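
For the record, here is that computation in executable form. It is a direct
transcription of the formula above, not AM's code; the default Act, Facet, and
Concept worths are simply the ones from this example.

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defun combine-reason-values (reasons &optional (act-worth 100)
                                                (facet-worth 700)
                                                (concept-worth 800))
  "Worth(J) = ||sqrt(sum Ri^2)|| * (.2 Worth(A) + .3 Worth(F) + .5 Worth(C))."
  (let* ((ratings    (mapcar #'first reasons))
         (root       (sqrt (reduce #'+ (mapcar (lambda (r) (* r r)) ratings))))
         (normalized (/ root 1000.0))          ; the "||...||" normalization
         (dot        (+ (* .2 act-worth)
                        (* .3 facet-worth)
                        (* .5 concept-worth))))
    (round (* normalized dot))))

;; The worked example:  (Check Examples Primes), reasons rated 200 300 200 500
;; (combine-reason-values '((200 "r1") (300 "r2") (200 "r3") (500 "r4")))
;;    ==>  408   (roughly the 400 arrived at above)

.E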

The above formula was intended originally as a first pass,
an ad hoc guess, which I expected I'd have to modify later.
Since it has worked successfully,
I have not messed with it. The conclusion is that the particular form of the
formula is unimportant; only some general characteristics need be present:

.BN

λλ The value is at least as big as the value of the best reason.

λλ The more distinct reasons that exist, the higher the value.

λλ The value isn't increased if the same reason is present more than once.

λλ The value is greatly increased if two good but very different reasons exist.

.E

I believe that all of these criteria are absolutely essential to  good
behavior of the system.
Experiments 1, 3, and 5 bear on this question$$ See Section 6.4, page 42 $.

.SSEC(Summary: Agendas and Heuristic Search)

<< Why the agenda mechanism is well-suited to heur search algorithms >

Consider Nilsson's description of depth-first searching, and of
breadth-first searching. He has us maintain a list of "open" nodes.
Repeatedly, we pluck the top one and expand it. In the process,
some new nodes may be added to the Open list. In the case of depth-first
searching, they are added at the top; for breadth-first search, they must be
added at the bottom. For heuristic search, or "best-first" search, they are
evaluated in some numeric way, and then "merged" into the already-sorted
list of Open nodes.
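
The only difference among these three regimes is where a newly generated node
gets spliced into the Open list. As a one-function sketch (with higher scores
assumed better, and nothing AM-specific about it):

.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4

(defun add-to-open (new-nodes open strategy &optional (score #'identity))
  "Return the new Open list; SCORE matters only for best-first search."
  (ecase strategy
    (:depth-first   (append new-nodes open))             ; added at the top
    (:breadth-first (append open new-nodes))             ; added at the bottom
    (:best-first    (sort (append open (copy-list new-nodes))
                          #'> :key score))))             ; merged by evaluation

.E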

This process is clearly analogous to the agenda mechanism AM uses.
It is also the same as the one used in KRL [reference].
The main difference is that in AM, symbolic reasons are used
(albeit in trivial token-like ways) to decide whether -- and how much --
to boost the priority of a task when it is suggested again.

Agendas are the canonical kind of data structure for a "best-first" process.